ABSTRACT

We consider the behavior of distributed asynchronous routing algorithms
for optimizing the flows in a virtual circuit data network, with respect
to a given convex cost function. The algorithms operate with minimal
synchronization of computations and information exchange between
different processors and consist of gradient projection iterations which
compute a target set of flows for each path. The processors then try to
make the actual flows equal to the target flows by appropriately
assigning paths to incoming new virtual circuits. We concentrate on the
“many small users” case, in which there is (on the average) a very large
number of virtual circuits, each one requiring a small communication
rate. This note is a followup to our earlier paper [TsBe] and addresses
the limiting behavior when the frequency of iteration becomes infinite
relative to the frequency of information exchange between nodes.

I.  MODEL  DESCRIPTION. 

We are given a network described by a directed graph $G = (V, E)$, where
$V$ is the set of nodes and $E$ is the set of directed links. For each
pair $w = (i,j)$ of distinct nodes $i$ and $j$ (also called an
origin-destination, or OD, pair), let $P_w$ be a given set of paths from
$i$ to $j$ containing no loops. (These are the candidate paths to which
virtual circuits will be assigned for transmitting messages from $i$ to
$j$.) For each OD pair $w$ and for any time $t$, there will be a total of
$N_w(t)$ active virtual circuits linking node $i$ to node $j$. These
virtual circuits are assigned to paths, $N_p(t)$ of them being assigned
to path $p$. (Thus, $N_w(t) = \sum_{p \in P_w} N_p(t)$.) New virtual
circuits for OD pair $w$ are generated according to a Poisson process, at
a rate $\lambda_w/\epsilon$, where $\epsilon$ is a small positive
parameter. Newly generated virtual circuits are assigned to a path
$p \in P_w$ and remain assigned to that path during the entire lifetime
of the virtual circuit, which is an (independent) exponential random
variable with rate $\mu_w$. Each virtual circuit for OD pair $w$ is
assumed to require communication rate $\epsilon$ from each link in the
path to which it is assigned. (Thus, by letting $\epsilon$ be very small,
we are in the “many small users” situation.)
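The arrival and departure model above can be illustrated with a small simulation. The following is a minimal sketch, not part of the original analysis: it simulates the number of active virtual circuits of a single OD pair as a birth-death process and returns the time average of the scaled load (the small parameter times the number of active circuits). The function name, the parameter values, and the choice to start near the steady state are all assumptions made for illustration.

```python
import random


def simulate_scaled_load(lam, mu, eps, horizon, seed=0):
    """Time average of eps * N(t) for one OD pair (illustrative sketch).

    Circuits arrive as a Poisson process of rate lam / eps, and each
    active circuit ends after an independent exponential lifetime of
    rate mu. All names and values here are hypothetical.
    """
    rng = random.Random(seed)
    n = round(lam / (mu * eps))  # assumption: start near the steady state
    t = 0.0
    area = 0.0  # integral of eps * N(t) over [0, horizon]
    while True:
        birth_rate = lam / eps
        death_rate = mu * n
        total_rate = birth_rate + death_rate
        dt = rng.expovariate(total_rate)  # time until the next event
        if t + dt >= horizon:
            area += eps * n * (horizon - t)
            break
        area += eps * n * dt
        t += dt
        # Choose which event occurred, in proportion to its rate.
        if rng.random() * total_rate < birth_rate:
            n += 1
        else:
            n -= 1
    return area / horizon


if __name__ == "__main__":
    # The average should be close to lam / mu, whatever eps is.
    print(simulate_scaled_load(lam=2.0, mu=1.0, eps=0.01, horizon=50.0, seed=1))
```

Shrinking `eps` in this sketch increases the number of circuits but reduces the fluctuations of the scaled load around `lam / mu`, which is the regime the paper is interested in.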

We are primarily interested in the case where $\epsilon$ is very small
and will therefore consider the asymptotic performance of routing
schemes as $\epsilon \to 0$. For this reason, we prefer to work with the
variables $r_w(t)$ and $x_p(t)$ defined by $r_w(t) = \epsilon N_w(t)$ and
$x_p(t) = \epsilon N_p(t)$, respectively. Notice that the mean of
$r_w(t)$ (at steady state) is equal to $r_w = \lambda_w/\mu_w$ and is
therefore independent of $\epsilon$. For any link $(i,j)$, we also define
the flow $F_{ij}(t)$ through that link to be equal to the sum of
$x_p(t)$ over all paths $p$ which use link $(i,j)$.
We introduce a cost function $D$ which is meant to penalize congestion
(large flows) through each link. In particular, we assume the separable
form

$$D(F) = \sum_{(i,j) \in E} D_{ij}(F_{ij}).$$

We assume that each $D_{ij}$ is twice continuously differentiable and
that its second derivative is bounded away from zero. (In particular,
$D_{ij}$ is strictly convex.) Since the $F_{ij}$'s are functions of the
$x_p$'s, we may rewrite the cost function in terms of the $x_p$
variables to obtain the form $D(x) = \sum_{(i,j) \in E} D_{ij}(x)$,
where $x$ is the vector of all $x_p$'s. Clearly, each $D_{ij}$, viewed as
a function of $x$, inherits the convexity property of the $D_{ij}$'s
viewed as functions of the link flows.

We are interested in routing schemes which minimize
$\limsup_{t \to \infty} E[D(x(t))]$. Let $D^*$ be the minimum value of
$D(\cdot)$ subject to the constraints $x_p \ge 0$,
$\sum_{p \in P_w} x_p = r_w$, $\forall w$, $\forall p \in P_w$. As
$\epsilon \to 0$ and $t \to \infty$, the random variable $r_w(t)$
converges to its steady state mean $r_w$ in the mean square, and this
fact may be exploited to show that there exist routing policies under
which $\lim_{\epsilon \to 0} \limsup_{t \to \infty} E[D(x(t))] \le D^*$.
On the other hand, $\limsup_{t \to \infty} E[D(x(t))] \ge D^*$ for every
routing scheme, this being a consequence of Jensen's inequality.
Therefore, an algorithm for the deterministic multicommodity network
flow problem (for which $D^*$ is the optimal value) may be used as a
guide for obtaining an asymptotically (as $\epsilon \to 0$) optimal
routing scheme.
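The lower bound obtained from Jensen's inequality can be spelled out as a short worked inequality. This is only a sketch of the standard argument, under the assumption that expectations are taken at steady state, so that the expected total flow of each OD pair equals its mean $r_w$:

```latex
E\big[D(x(t))\big] \;\ge\; D\big(E[x(t)]\big) \;\ge\; D^*,
\qquad \text{because} \qquad
E[x_p(t)] \ge 0, \quad
\sum_{p \in P_w} E[x_p(t)] = E[r_w(t)] = r_w .
```

The first inequality is Jensen's inequality applied to the convex function $D$. The second holds because the mean flow vector satisfies the constraints of the deterministic problem, whose minimum value is $D^*$.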

The particular scheme we propose is based on the gradient projection
algorithm for the above defined multicommodity flow problem, which
consists of the following iteration (for each $w$). Let
$x_w = (x_p;\ p \in P_w)$ be the vector of all path flows for a given OD
pair $w$. Let $\bar p$ be a path in $P_w$ with the smallest value of
$(\partial D/\partial x_p)(x)$. We then let

$$x_p \leftarrow \max\Big\{0,\; x_p - \frac{\gamma}{\mu_p}\Big(\frac{\partial D}{\partial x_p}(x) - \frac{\partial D}{\partial x_{\bar p}}(x)\Big)\Big\}, \qquad p \ne \bar p, \tag{1.a}$$

$$x_{\bar p} \leftarrow r_w - \sum_{p \in P_w,\ p \ne \bar p} x_p. \tag{1.b}$$

Here $\mu_p$ is a positive scaling constant (typically obtained from an
approximation of the matrix of second derivatives of $D$), and $\gamma$
is a small positive stepsize.
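One step of the gradient projection iteration for a single OD pair can be sketched in a few lines. This is an illustrative sketch rather than the authors' implementation: the function name and the dictionary-based data layout are hypothetical, and it assumes that the stepsize is divided by the positive scaling constant of each path.

```python
def gradient_projection_step(x, grad, r, gamma, scale):
    """One gradient projection step for the paths of a single OD pair.

    x     : dict path -> current flow (nonnegative, summing to r)
    grad  : dict path -> partial derivative of the cost w.r.t. that flow
    r     : total flow the OD pair must carry
    gamma : small positive stepsize
    scale : dict path -> positive scaling constant (assumed to divide gamma)
    """
    # Path with the smallest first derivative; it absorbs the shifted flow.
    p_bar = min(x, key=lambda p: grad[p])
    new_x = {}
    for p in x:
        if p != p_bar:
            shift = (gamma / scale[p]) * (grad[p] - grad[p_bar])
            new_x[p] = max(0.0, x[p] - shift)
    # The minimum-derivative path receives whatever flow remains.
    new_x[p_bar] = r - sum(new_x.values())
    return new_x


if __name__ == "__main__":
    flows = {"a": 1.0, "b": 1.0}
    derivs = {"a": 3.0, "b": 1.0}
    print(gradient_projection_step(flows, derivs, r=2.0, gamma=0.1,
                                   scale={"a": 1.0, "b": 1.0}))
```

Because every derivative difference is nonnegative, the flows of the other paths can only decrease, so the flow assigned to the minimum-derivative path stays nonnegative whenever the starting point is feasible.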

In a realistic data network as described above, iteration (1) cannot be
implemented exactly if $x_w$ is to stand for the vector of actual flows
through the paths $p \in P_w$. Some of the reasons are the following:
a) Due to the stochastic nature of the generation and extinction of
virtual circuits, it is impossible to enforce a desired number of them
for each $p$; b) The processor that is to execute iteration (1) for a
certain OD pair $w$ may not have access to the exact current value of
the derivative of $D$, evaluated at the vector $x$ of current path
flows; c) The value of $r_w$ may not be known exactly; in fact, in
realistic situations $r_w$ varies slowly with time, and a good routing
algorithm should be able to track such changes without causing flows to
be far from optimal. In the following paragraphs we describe how
iteration (1) would be implemented in a realistic environment so as to
overcome the above mentioned difficulties.

We assume that the processor in charge of the OD pair $w$ ($PR_w$, for
short) has available at each time $t$ a target flow $\bar x_p(t)$ for
each path $p \in P_w$. (We denote by $\bar x_w(t)$ the vector with
components $\bar x_p(t)$, $p \in P_w$.) These target flows are updated at
times $t_w^n$, $n = 1, 2, \ldots$. We assume that for some scalars
$\delta$, $\Delta$,

$$0 \le \delta \le t_w^{n+1} - t_w^n \le \Delta, \qquad \forall n, w. \tag{2}$$

Other than the above inequalities, we do not impose any restrictions on
the sequence $\{t_w^n\}$, thus assuming minimal synchronization of the
computations of processors in charge of different OD pairs. Suppose that
at time $t_w^n$, processor $PR_w$ has available estimates $\lambda_p^n$
of the partial derivatives $\partial D/\partial x_p$, for each
$p \in P_w$, evaluated at $x(t_w^n)$. This processor is typically the
origin node for OD pair $w$ and we may reasonably assume that it has
available $r_w(t)$ and $x_w(t)$ at each time $t$. Then, this processor
evaluates a vector $\bar x_w^n$ of target flows using the formulas

$$\bar p_w = \arg\min_{p \in P_w} \{\lambda_p^n\}, \tag{3.a}$$

$$\bar x_p^n = \max\Big\{0,\; x_p(t_w^n) - \frac{\gamma}{\mu_p}\big(\lambda_p^n - \lambda_{\bar p_w}^n\big)\Big\}, \qquad p \ne \bar p_w, \tag{3.b}$$

$$\bar x_{\bar p_w}^n = r_w(t_w^n) - \sum_{p \in P_w,\ p \ne \bar p_w} \bar x_p^n. \tag{3.c}$$

We assume that there exist constants $m, M > 0$ such that
$0 < m \le \mu_p \le M$.

The estimates $\lambda_p^n$ are assumed to be formed as follows: a
processor observing link $(i,j)$ computes, once in a while, the
derivative of $D_{ij}$, evaluated at $F_{ij}(t)$, where $t$ is the
current time, and transmits this value to processor $PR_w$. Such
derivatives (obtained from each link $(i,j)$) are used by processor
$PR_w$ to construct the estimate $\lambda_p^n$ as the sum of $D'_{ij}$
over all links $(i,j)$ on path $p$. This estimate would be exact if each
derivative had been evaluated at the flow $F_{ij}(t_w^n)$; however, due
to lack of synchronization between processors and to communication
delays, this will not be the case in general. Nevertheless, if the flows
in the network change slowly, this estimate will be fairly accurate.
(The above described scheme may be generalized by allowing the processor
associated with link $(i,j)$ to use a short term average of $F_{ij}$
rather than the instantaneous value $F_{ij}(t)$ at a single time $t$.)
We assume that the processors associated with each link evaluate the
appropriate derivatives at least once every $B$ time units, where $B$ is
some constant; furthermore, communication delays are also assumed to be
bounded by $B$.
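The construction of a path derivative estimate from link reports can be sketched as follows. This is a minimal illustration, not the authors' code: the function name and data layout are hypothetical, and the quadratic link cost in the usage example is an assumption chosen only because its derivative is easy to check by hand.

```python
def path_derivative_estimate(path_links, reported_flow, link_derivative):
    """Sum the reported link derivatives along one path.

    path_links      : list of links (i, j) making up the path
    reported_flow   : dict link -> flow last observed on that link; because
                      of delays it may differ from the current flow
    link_derivative : dict link -> function returning the derivative of the
                      link cost at a given flow
    """
    return sum(link_derivative[link](reported_flow[link]) for link in path_links)


if __name__ == "__main__":
    # Assumed link cost F**2 on both links, so the derivative is 2 * F.
    derivs = {(1, 2): lambda f: 2.0 * f, (2, 3): lambda f: 2.0 * f}
    stale_flows = {(1, 2): 0.5, (2, 3): 1.0}
    print(path_derivative_estimate([(1, 2), (2, 3)], stale_flows, derivs))
```

The estimate is only as accurate as the reported flows: if a link flow has moved since it was last reported, the corresponding term in the sum is evaluated at the wrong point.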

It remains to describe how processor $PR_w$ assigns incoming virtual
circuits to paths during the time interval $(t_w^n, t_w^{n+1})$. The
objective is of course to make the actual flows $x_p(t)$ as close as
possible to the target flows $\bar x_p^n$, but there are several
alternatives; we present two:

(i) Randomization: Each incoming virtual circuit is assigned to path $p$
with probability $\bar x_p^n / r_w(t_w^n)$, independently of other
assignments.

(ii) Metering: A new virtual circuit generated at time $t$ is assigned
to the path $p$ for which the value of $\bar x_p^n - x_p(t)$ is largest.
(Ties are broken by randomization.)
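The two assignment rules can be sketched side by side. This is a minimal illustration with hypothetical function names and a dictionary-based layout; it assumes that the target flows are nonnegative and sum to the total flow that was measured when the targets were computed.

```python
import random


def assign_randomized(targets, r_at_update, rng):
    """Randomization: choose path p with probability targets[p] / r_at_update."""
    paths = list(targets)
    weights = [targets[p] / r_at_update for p in paths]
    return rng.choices(paths, weights=weights, k=1)[0]


def assign_metered(targets, actual, rng):
    """Metering: choose the path whose target exceeds its actual flow the most."""
    gap = {p: targets[p] - actual[p] for p in targets}
    largest = max(gap.values())
    tied = [p for p in gap if gap[p] == largest]
    return rng.choice(tied)  # ties are broken at random


if __name__ == "__main__":
    rng = random.Random(0)
    targets = {"a": 1.2, "b": 0.8}
    actual = {"a": 1.0, "b": 1.0}
    print(assign_metered(targets, actual, rng))
    print(assign_randomized(targets, 2.0, rng))
```

Metering is deterministic except for ties, so it sends every new circuit to a single path until the largest gap is closed. Randomization spreads the new circuits over all paths with positive target flow.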

Metering is generally recommended over randomization in practice since
it tends to bring about more quickly a close match between target and
actual path flows. On the other hand, the use of metering has an
interesting adverse effect on convergence when the frequency of
iteration is very large relative to the frequency of information
exchange. This is the main point of this note, and is discussed in the
next section.

Related Research: The above described scheme is motivated by an
algorithm implemented in the CODEX network (see [BeGa1], Section 5.8).
The application of nonlinear programming methods for distributed routing
in data networks goes back to [Ga]. The particular gradient projection
method considered here was proposed in [Be], [BeGa2]. The asynchronous
version of the gradient projection algorithm was analyzed in [TsBe],
following more general studies of asynchronous descent algorithms
[TsBeAt]. In all these references, the stochastic nature and short term
variations of the input traffic are ignored. In [GaBe] the “many small
users” assumption is introduced in a stochastic setting and asymptotic
optimality (as $\epsilon \to 0$) is proved for a different class of
routing algorithms, under the assumption of synchronism. Finally, [Ts]
considered the simultaneous effects of asynchronism and the stochastic
nature of input traffic (under the many small users assumption), thus
integrating previous models and approaches. The statements made in the
next section are variations of some of the results in [Ts].

II.  DISCUSSION  OF  PERFORMANCE.

We  discuss  separately  two  cases: 

A: Let us assume that the constant $\delta$ of inequality (2) is
nonzero. Equivalently, there is a lower bound on the time between
consecutive updates of the target flows by each processor. Then, the
following result has been proved, in somewhat different form, in [Ts]:
for any fixed positive value of $\delta$ we may choose the stepsize
$\gamma$ small enough to guarantee that, with either randomization or
metering, asymptotic optimality is obtained, in the sense that

$$\lim_{\epsilon \to 0} \limsup_{t \to \infty} E[D(x(t))] = D^*.$$

The proof of the above statement is quite long because of the
technicalities involved. However, the outline of the argument is fairly
simple. We first discard the possibility that $|r_w(t) - r_w|$ is not
negligible, this being a low probability event for $t$ large and
$\epsilon$ small. Then, we choose $\gamma$ to be, say, an order of
magnitude smaller than $\delta$. Then, the difference
$\bar x_p^n - x_p(t_w^n)$ is very small when compared with the length of
the time interval $[t_w^n, t_w^{n+1}]$. Thus, by the end of that time
interval, $x_p(t)$ is equal to $\bar x_p^n$ plus some random deviation
which is negligible as $\epsilon \to 0$. Thus, after neglecting small
random deviations, we may safely forget the stochastic nature of the
generation and extinction of virtual circuits. We are therefore in the
setting of a deterministic asynchronous gradient projection algorithm,
whose convergence to an optimum of the cost function can be proved using
the techniques of [TsBe, TsBeAt].

B: We now consider the case where $\delta = 0$. This means that it is
possible that one processor performs an unbounded number of iterations
before other processors get a chance of performing a single iteration.
For this case, it can be shown by means of an example that if metering
is employed, then no matter how $\gamma$ is chosen, the algorithm may be
non-convergent. The essence of such an example admits a simple
explanation. For asynchronous gradient-like algorithms to be convergent
one generally needs the difference between true derivatives and the
estimates of derivatives used in the computation to be of the order of
the stepsize $\gamma$. This requires that the flows at the time $t_w^n$
of an iteration are not substantially different from the flows that were
used in the evaluation of the derivative estimate $\lambda_p^n$. This in
turn requires that $F_{ij}(t) - F_{ij}(s)$ be of the order of $\gamma$
when $t - s$ is of the order of $B$ (which is the bound on communication
delays). However, with metering, even if $\bar x_p^n - x_p(t)$ is of the
order of $\gamma$, there is always a path flow $x_p(t)$ which changes
with $O(1)$ speed. Once $x_p(t)$ reaches $\bar x_p^n$ (which takes
$O(\gamma)$ time), processor $PR_w$ may compute a new target and the
process will be repeated. Thus, some path flows $x_p(t)$ may keep moving
at (order of) unit speed for a time interval of $O(1)$ magnitude, before
new derivative information is obtained from other processors. It follows
that the previously mentioned requirements for convergence fail to hold,
derivative estimates have an $O(1)$ error, and there is nothing to
guarantee that the algorithm progresses in a descent direction.
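The orders of magnitude quoted for metering can be checked with a short calculation. This is a sketch under simplifying assumptions that are not spelled out in the text: a single path $p$ holds the largest gap, so it receives every new circuit of the OD pair, and random fluctuations are ignored.

```latex
% All arrivals (rate \lambda_w/\epsilon, each adding \epsilon) go to path p:
\frac{d}{dt}\,E[x_p(t)] \;=\; \lambda_w - \mu_w\,E[x_p(t)]
  \;=\; \mu_w\big(r_w - E[x_p(t)]\big) \;=\; O(1),
\qquad r_w = \lambda_w/\mu_w .

% Time needed to close a gap of order \gamma at that speed:
T \;\approx\; \frac{\bar x_p^n - x_p(t_w^n)}{\lambda_w - \mu_w\,x_p(t_w^n)}
  \;=\; \frac{O(\gamma)}{O(1)} \;=\; O(\gamma).
```

So a smaller stepsize shortens each excursion but does not slow the flow down, and with no lower bound on the time between updates the excursions can follow one another without pause.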

For randomization, the situation is different: if $x_p$ stands for a
generic true flow variable and $\bar x_p$ stands for a target variable,
then the mean of $x_p$ satisfies a differential equation of the form
$dx_p/dt = \mu_w(\bar x_p - x_p)$. Thus if initially
$\bar x_p - x_p = O(\gamma)$, which is always the case after an
iteration, $E[x_p(t)]$ changes with $O(\gamma)$ speed. The variance of
$x_p(t)$ goes to zero as $\epsilon \to 0$, and it follows that the
derivative estimates are wrong only within an
$O(\gamma + \epsilon^{1/2})$ factor. By choosing $\epsilon$ and $\gamma$
small enough, the derivative estimates are accurate enough to guarantee
that iterations proceed in a descent direction, and the techniques of
[TsBe] may be used to demonstrate convergence, in a suitable sense. We
should mention here that the technique employed in [TsBe] requires that
the true flow at the time of an iteration is a convex combination of the
true flow at the previous iteration and the target flow computed at that
previous iteration. This property is valid in this case, modulo certain
stochastic terms which vanish as $\epsilon \to 0$; this is a
straightforward consequence of the differential equation describing the
evolution of the mean of $x_p$ when randomization is employed.
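The convex combination property can be read off from the solution of the differential equation for the mean flow. The following is a sketch of that calculation, under the assumption that the target is held fixed between two consecutive updates, at times $s$ and $t$, and that the mean evolves exactly according to the stated equation with rate $\mu_w$:

```latex
E[x_p(t)] \;=\; e^{-\mu_w (t-s)}\,E[x_p(s)]
  \;+\; \big(1 - e^{-\mu_w (t-s)}\big)\,\bar x_p ,
\qquad t \ge s .
```

The two weights are nonnegative and sum to one, so the mean flow at the later time is a convex combination of the mean flow at the earlier time and the target. Differentiating also shows that the speed of change is $\mu_w$ times the remaining gap, which is of order $\gamma$ when the initial gap is.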